---
title: Analysis of Star Column
keywords: fastai
sidebar: home_sidebar
summary: "Objective: "
description: "Objective: "
nb_path: "02_analysis_star.ipynb"
---

Load Data

{% raw %}
df_sub.head(2)
| Column | Row 0 | Row 1 |
| --- | --- | --- |
| Unnamed: 0 | 0 | 1 |
| date_start | 2019-10-16 | 2020-02-12 |
| location.district_label | Bo | Western Area Urban |
| location.school_label | Kaku A Community Junior Secondary School | Baptist Junior Secondary School |
| cd_teacher_name | Moses Simbo | Mr David Kamara |
| cd_class | jss1 | jss1 |
| cd_subject | math | english |
| cd_girlspresent | 12.0 | 25.0 |
| cd_boyspresent | 10.0 | 23.0 |
| location.school_school_level | Junior Secondary | Junior Secondary |
| co_star1 | Teacher involved pupils in to the lesson | Content |
| co_star1_example | Teacher involved pupils in the lesson during t... | The teacher is comfortable and in control of t... |
| co_star2 | Teacher avoid flogging | Assessment |
| co_star2_example | Teacher did not flog pupils during the lesson. | The teacher uses different methods of assessin... |
| co_wish | teacher did not scanned the class during the o... | to be more audible in the next lesson. feedbac... |
| co_wish_actions | SSO engaged and encourage the teacher to pleas... | Feedback session with teacher after observation |
| feedback_notes | teacher did not scanned the class during the o... | NaN |
| co_lpm_follow | 2.0 | 4.0 |
| Pedagogy composite score (max of 24) | 13 | 5 |
| co_star | teacher involved pupils in to the lesson. teac... | content. the teacher is comfortable and in con... |
| co_star_clean | teacher involved pupils lesson teacher involve... | content teacher comfortable control content ta... |
| co_wish_clean | teacher scanned class opening lesson sso engag... | audible lesson feedback session teacher observ... |
| feedback_notes_clean | teacher scanned class opening lesson | NaN |
| co_star_clean_lemma | teacher involve pupil lesson teacher involve p... | content teacher comfortable control content te... |
| co_wish_clean_lemma | teacher scan class opening lesson sso engage e... | audible lesson feedback session teacher observ... |
| feedback_notes_clean_lemma | teacher scan class opening lesson | NaN |
| co_star_pos_lemma | `{'NOUN': ['teacher', 'pupil', 'lesson', 'pract...` | `{'NOUN': ['content', 'teacher', 'control', 'as...` |
| co_wish_pos_lemma | `{'NOUN': ['teacher', 'class', 'opening', 'less...` | `{'PART': 'to', 'VERB': 'be', 'ADV': 'more', 'A...` |
| feedback_notes_pos_lemma | `{'NOUN': ['teacher', 'class', 'opening', 'less...` | `{'NOUN': 'nan'}` |
| year | 2019 | 2020 |
| month | 10.0 | 2.0 |
{% endraw %}

Visualize Word Cloud of Star

Visualize a word cloud of the top 50 words in the cleaned Star field. This gives an overall picture of the words most commonly used over the years when filling in the Star column of the lesson feedback.

After an initial trial, the following words were skipped because they are known to occur throughout the dataset and add little insight: {"teacher", "student", "students", "lesson", "class", "pupil", "pupils"}.

{% raw %}

return_wc[source]

return_wc(x, max_word=50, facecolor='k', bg_color='white')

{% endraw %} {% raw %}

plot_wc[source]

plot_wc(x, max_word=50, facecolor='k', title='None', bg_color='white')

{% endraw %} {% raw %}
{% endraw %} {% raw %}
co_star_clean = ' '.join(df_sub['co_star_clean'].tolist())
plot_wc(co_star_clean)
{% endraw %}
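A word cloud sizes each word by how often it occurs, so the ranking behind the plot can be checked without any plotting library. The following is a minimal sketch, not the notebook's `return_wc` implementation: `top_words` and `SKIP_WORDS` are hypothetical names, and the sample text is illustrative only.

```python
from collections import Counter

# Words skipped in this analysis because they appear in almost every entry.
SKIP_WORDS = {"teacher", "student", "students", "lesson", "class", "pupil", "pupils"}


def top_words(text, max_word=50, skip=SKIP_WORDS):
    """Return the `max_word` most frequent words in `text`, ignoring `skip`."""
    tokens = [tok for tok in text.lower().split() if tok not in skip]
    return Counter(tokens).most_common(max_word)


# Illustrative input only; the real input would be the joined `co_star_clean` column.
sample = "teacher involved pupils lesson teacher involved pupils practice content"
print(top_words(sample, max_word=2))
```

Inspecting the counts alongside the cloud makes it easier to judge whether a word looks large because it is genuinely frequent or only because of the layout.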

Visualize Word Cloud of different Part of Speech

Since part-of-speech tagging has already been done, it is worth looking at word clouds for each part of speech. Verbs, nouns, adjectives, and adverbs are plotted separately, which may add more insight.

{% raw %}

set_key[source]

set_key(dictionary, key, value)

{% endraw %} {% raw %}

separate_pos[source]

separate_pos(df)

{% endraw %} {% raw %}
{% endraw %}
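The POS columns shown in the data preview hold one dictionary per row, mapping a tag such as `NOUN` to either a list of lemmas or a single string. One way to pool these into per-tag word lists is sketched below. It is an assumption about what `separate_pos` might do, not its actual source; `pool_pos` is a hypothetical helper, and if the column was read from CSV the dictionaries may first need parsing (for example with `ast.literal_eval`).

```python
from collections import defaultdict


def pool_pos(pos_dicts):
    """Merge per-row {tag: words} dictionaries into one {tag: [words...]} mapping."""
    pooled = defaultdict(list)
    for row in pos_dicts:
        for tag, words in row.items():
            # A tag may map to a single string instead of a list.
            if isinstance(words, str):
                words = [words]
            # The preview shows {'NOUN': 'nan'} for missing text; drop that placeholder.
            pooled[tag].extend(w for w in words if w != "nan")
    return dict(pooled)


# Illustrative rows shaped like the `co_star_pos_lemma` values in the preview.
rows = [
    {"NOUN": ["teacher", "pupil", "lesson"], "VERB": ["involve"]},
    {"NOUN": "nan"},
    {"PART": "to", "VERB": "be", "ADV": "more"},
]
pooled = pool_pos(rows)
verbs = " ".join(pooled["VERB"])  # a string like this could be passed to plot_wc
print(verbs)
```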

Top 50 Verbs of Star Field

{% raw %}
plot_wc(verbs)
{% endraw %}

Top 50 Nouns of Star Field

{% raw %}
plot_wc(noun)
{% endraw %}

Top 20 Adjectives of Star Field

{% raw %}
plot_wc(adj,20)
{% endraw %}

Top 20 Adverbs of Star Field

{% raw %}
plot_wc(adv,20)
{% endraw %}

Bigram and Trigram of words

A word cloud is good for showing which single words are emphasised, but bigrams and trigrams are better for understanding phrases within sentences. They show which phrases are most common in the Star column.

{% raw %}

calculate_ngram[source]

calculate_ngram(df, col='co_star_clean')

{% endraw %} {% raw %}
{% endraw %} {% raw %}
plt.figure(figsize=(10,15))
# Pass x and y by keyword; newer seaborn releases no longer accept them positionally.
sns.barplot(x=df['frequency'], y=df['bigram'])
plt.show()
{% endraw %}
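Counting n-grams amounts to sliding a window of n words over each cleaned entry and tallying the resulting phrases. A minimal sketch follows, assuming whitespace-separated text as in `co_star_clean`; `count_ngrams` is a hypothetical stand-in and may differ from `calculate_ngram`, whose source is not shown here.

```python
from collections import Counter


def count_ngrams(texts, n=2):
    """Count n-word phrases across an iterable of whitespace-separated strings."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        # Zipping n staggered views of the token list yields each n-gram once.
        for gram in zip(*(tokens[i:] for i in range(n))):
            counts[" ".join(gram)] += 1
    return counts


# Illustrative entries only.
entries = [
    "teacher involved pupils lesson",
    "teacher involved pupils practice",
]
bigram_counts = count_ngrams(entries, n=2)
trigram_counts = count_ngrams(entries, n=3)
print(bigram_counts.most_common(2))
print(trigram_counts.most_common(1))
```

Because the window is applied per entry, phrases never span two different feedback records.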

Word Net Visualisation

Many words are common across bigrams, so let us look at a network graph of them.

{% raw %}

visulaizeBigrams[source]

visulaizeBigrams(bigram_df, K)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
visulaizeBigrams(bigram_df=df, K=12)
{% endraw %}
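A bigram network treats each word as a node and each bigram as an edge, so a word shared by many bigrams becomes a hub. The sketch below builds that structure as a plain adjacency mapping. It assumes bigrams are space-separated strings with a frequency, and `bigram_graph` is a hypothetical helper rather than part of `visulaizeBigrams`.

```python
from collections import defaultdict


def bigram_graph(bigram_freq):
    """Build an undirected weighted adjacency map from {"w1 w2": frequency}."""
    graph = defaultdict(dict)
    for bigram, freq in bigram_freq.items():
        words = bigram.split()
        if len(words) != 2:
            continue  # this sketch handles bigrams only; trigrams are skipped
        first, second = words
        graph[first][second] = freq
        graph[second][first] = freq
    return dict(graph)


# Illustrative frequencies only.
freqs = {"lesson plan": 5, "lesson note": 3, "teaching aid": 2}
graph = bigram_graph(freqs)
# Words connected to the most distinct neighbours are the hubs of the network.
hubs = sorted(graph, key=lambda word: len(graph[word]), reverse=True)
print(hubs[0], graph[hubs[0]])
```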

Variation of rating with Bigrams/Trigrams

Let us now see how the rating (co_lpm_follow) varies with each Star bigram.

{% raw %}
fig, axs = plt.subplots(10,5, figsize=(25, 40), facecolor='w', edgecolor='k')
fig.subplots_adjust(hspace = .5, wspace=.25)
axs = axs.ravel()

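for i in range(len(df)):
    bigram = df.bigram[i]
    # Literal substring match; na=False treats missing text as "no match".
    test_df = df_sub[df_sub['co_star'].str.contains(bigram, regex=False, na=False)]
    # Count rows (with a non-null cd_class) per rating for this bigram.
    ser = test_df.groupby('co_lpm_follow').count()['cd_class']
    try:
        sns.barplot(x=ser.index, y=ser.values, ax=axs[i])
    except Exception:
        # Leave the panel empty if there is nothing to plot.
        pass
    axs[i].set_title(bigram)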
{% endraw %}

Variation of Star between years

How does the Star column vary over the years, from 2018 to 2020?

What are the top words in the Star column for each year?

{% raw %}

plot_multi_wc[source]

plot_multi_wc(n_topic, word_clouds, texts)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
   
plot_multi_wc(len(years),wcs,texts)
{% endraw %}
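Comparing years first requires the cleaned text to be pooled per year. Below is a minimal sketch of that grouping step, assuming each record carries a `year` and a cleaned text field as in the data preview. `text_by_year` is a hypothetical helper, and the notebook's own construction of `years`, `wcs`, and `texts` is not shown.

```python
from collections import defaultdict


def text_by_year(records):
    """Join cleaned text per year from (year, text) pairs, skipping missing text."""
    grouped = defaultdict(list)
    for year, text in records:
        # Missing values (None or NaN) are not strings and are ignored.
        if isinstance(text, str) and text:
            grouped[year].append(text)
    # Sort by year so the panels appear in chronological order.
    return {year: " ".join(parts) for year, parts in sorted(grouped.items())}


# Illustrative records only.
records = [
    (2019, "teacher involved pupils lesson"),
    (2020, "content teacher comfortable control"),
    (2019, "teacher scanned class"),
    (2020, None),
]
yearly = text_by_year(records)
years = list(yearly)
texts = ["Top 50 words of {}".format(year) for year in years]
print(years, texts)
```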

What are the top parts of speech (verbs, nouns, adverbs, adjectives) in the yearly data?

Top 20 Verbs between years

{% raw %}
texts = ["Top 20 VERBS words of {}".format(i) for i in years]
plot_multi_wc(len(years),verbs,texts)
{% endraw %}

Top 20 Nouns between years

{% raw %}
texts = ["Top 20 NOUNS words of {}".format(i) for i in years]
plot_multi_wc(len(years),nouns,texts)
{% endraw %}

Top 20 Adjectives between years

{% raw %}
texts = ["Top 20 Adjectives words of {}".format(i) for i in years]
plot_multi_wc(len(years),adjs,texts)
{% endraw %}

Top 20 Adverbs between years

{% raw %}
texts = ["Top 20 Adverbs words of {}".format(i) for i in years]
plot_multi_wc(len(years),advs,texts)
{% endraw %}

Variation of Bigram/Trigram between the years

How do the bigrams and trigrams vary over the years?

{% raw %}

plot_bigram[source]

plot_bigram(bigrams, texts)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
texts = ["Top 50 Bigrams/Trigram of {}\n".format(i) for i in years]
plot_bigram(bigrams,texts)
{% endraw %}

Word Net of Bigram/Trigram between years

{% raw %}

visulaizeBigrams_multi[source]

visulaizeBigrams_multi(bigram_dfs, texts, K)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
texts = ["Network of Bigrams/Trigram of {}".format(i) for i in years]
visulaizeBigrams_multi(bigrams,texts,K=10)
{% endraw %}